Abstract

Recent results established exponential lower bounds for the length of any
Resolution proof for the weak pigeonhole principle. More formally, it was
proved that any Resolution proof for the weak pigeonhole principle, with $n$
holes and any number of pigeons, is of length $\Omega(2^{n^{\epsilon}})$ (for a
constant $\epsilon = 1/3$). One corollary is that certain propositional
formulations of the statement $P \neq NP$ do not have short Resolution proofs.
After a short introduction to the problem of $P \neq NP$ and to the research
area of propositional proof complexity, I will discuss the above mentioned
lower bounds for the weak
pigeonhole principle and the connections to the hardness of proving $P \neq NP$.

1. Propositional logic

The basic syntactic units (atoms) of propositional logic are Boolean variables
$x_1, \ldots, x_n \in \{0,1\}$, where the value 0 represents False and the value
1 represents True. The propositional variables are combined with standard
Boolean gates (also called connectives), such as AND (conjunction), OR
(disjunction), and NOT (negation), to form Boolean formulas. Recall that in
propositional logic there are no quantifiers.

A literal is either an atom (i.e., a variable $x_i$) or the negation of an atom
(i.e., $\neg x_i$). A clause is a disjunction of literals. A term is a
conjunction of literals. A formula $f$ is in conjunctive-normal-form (CNF) if it
is a conjunction of clauses. A formula $f$ is in disjunctive-normal-form (DNF)
if it is a disjunction of terms. Since there are standard ways to transform a
formula to CNF or DNF (by adding new variables), we often limit the discussion
to CNF formulas or DNF formulas.

A Boolean formula $f(x_1, \ldots, x_n)$ is a tautology if
$f(x_1, \ldots, x_n) = 1$ for every $x_1, \ldots, x_n$. A Boolean formula
$f(x_1, \ldots, x_n)$ is unsatisfiable if $f(x_1, \ldots, x_n) = 0$ for every
$x_1, \ldots, x_n$. Obviously, $f$ is a tautology if and only if $\neg f$ is
unsatisfiable.

Given a formula $f(x_1, \ldots, x_n)$, one can decide whether or not $f$ is a
tautology by checking all the possibilities for assignments to
$x_1, \ldots, x_n$. However, the time needed for this procedure is exponential
in the number of variables, and hence may be exponential in the length of the
formula $f$.
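
The exhaustive procedure just described can be sketched in a few lines of Python. This is an illustration only: representing a formula as a Python function on 0/1 values, and the names `is_tautology`, `is_unsatisfiable` and the three example formulas, are assumptions of this sketch. The loop visits all $2^n$ assignments, which is exactly the exponential cost mentioned above.

```python
from itertools import product


def is_tautology(f, n):
    """Return True if f evaluates to 1 on every one of the 2**n assignments."""
    return all(f(*bits) == 1 for bits in product((0, 1), repeat=n))


def is_unsatisfiable(f, n):
    """Return True if f evaluates to 0 on every one of the 2**n assignments."""
    return all(f(*bits) == 0 for bits in product((0, 1), repeat=n))


# Hypothetical example formulas; NOT x is written as 1 - x on 0/1 values.
def excluded_middle(x1):
    return x1 | (1 - x1)        # x1 OR NOT x1


def contradiction(x1):
    return x1 & (1 - x1)        # x1 AND NOT x1


def implies(x1, x2):
    return (1 - x1) | x2        # x1 -> x2, falsified by x1 = 1, x2 = 0


print(is_tautology(excluded_middle, 1))
print(is_unsatisfiable(contradiction, 1))
print(is_tautology(implies, 2))
```

The same functions also illustrate the remark that $f$ is a tautology if and only if $\neg f$ is unsatisfiable: negating `excluded_middle` yields a formula on which `is_unsatisfiable` succeeds.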

2. $P \neq NP$

$P \neq NP$ is the central open problem in complexity theory and one of the
most important open problems in mathematics today. The problem has thousands
of equivalent formulations. One of these formulations is the following:

Is there a polynomial time algorithm $A$ that gets as input a
Boolean formula $f$ and outputs 1 if and only if $f$ is a tautology?

$P \neq NP$ states that there is no such algorithm.

A related open problem in complexity theory is the problem of
$NP \neq \text{co-}NP$. The problem can be stated as follows:

Is there a polynomial time algorithm $A$ that gets as input a
Boolean formula $f$ and a string $z$, and such that: $f$ is a tautology
if and only if there exists $z$ s.t.:

1. The length of $z$ is at most polynomial in the length of $f$.

2. $A(f,z) = 1$.

$NP \neq \text{co-}NP$ states that there is no such algorithm. Obviously,
$NP \neq \text{co-}NP$ implies $P \neq NP$.

It is widely believed that $P \neq NP$ (and $NP \neq \text{co-}NP$). At this
point, however, we are still far from giving a solution for these problems. It
is not clear why these problems are so hard to solve.

3. Propositional proof theory 

Propositional proof theory is the study of the length of proofs for different 
tautologies in different propositional proof systems. 

The notion of propositional proof system was introduced by Cook and Reckhow 
in 1973, as a direction for proving $NP \neq \text{co-}NP$ (and hence also
$P \neq NP$) [6]. A propositional proof system is a polynomial time algorithm
$A(f, z)$ such that a Boolean formula $f$ is a tautology if and only if there
exists $z$ such that $A(f, z) = 1$ (note that we do not require here that the
length of $z$ is at most polynomial in the length of $f$). We think of the
string $z$ as a proof for $f$ in the proof system $A$. We say that a tautology
$f$ is hard for a proof system $A$ if any proof $z$ for $f$ in the proof system
$A$ is of length super-polynomial in the length of $f$.
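
As a simple illustration of this definition, one can consider a truth-table system, in which a proof $z$ of $f$ is the list of all assignments and the algorithm checks that $f$ is true on each of them. The sketch below is not taken from the text: the DNF encoding (terms as sets of signed integers), the extra argument `num_vars`, and the names `eval_dnf` and `truth_table_verifier` are assumptions made for the example. The check runs in time polynomial in the combined length of $f$ and $z$, but every accepted $z$ has $2^n$ rows, so in this system every tautology with many variables has only long proofs.

```python
from itertools import product


def eval_dnf(dnf, assignment):
    """Evaluate a DNF formula.

    dnf is a list of terms, each a set of signed integers (+i for x_i,
    -i for NOT x_i); assignment is a tuple of 0/1 bits, bit i-1 for x_i.
    """
    return any(all(assignment[abs(l) - 1] == (l > 0) for l in term) for term in dnf)


def truth_table_verifier(dnf, num_vars, z):
    """A(f, z): accept iff z lists every assignment and f is true on each row.

    z is expected to be a list of tuples of 0/1 bits.
    """
    rows = set(z)
    # Every row must be a well-formed assignment to the num_vars variables.
    if any(len(r) != num_vars or any(b not in (0, 1) for b in r) for r in rows):
        return False
    # 2**num_vars distinct well-formed rows means that all assignments are present.
    if len(rows) != 2 ** num_vars:
        return False
    return all(eval_dnf(dnf, r) for r in rows)


f = [{1, 2}, {-1}, {-2}]                 # (x1 AND x2) OR NOT x1 OR NOT x2
z = list(product((0, 1), repeat=2))      # the full truth table as a proof
print(truth_table_verifier(f, 2, z))
```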

We often prefer to talk about unsatisfiable formulas, rather than tautologies,
and about refutation systems, rather than proof systems. A propositional
refutation system is a polynomial time algorithm $A(f, z)$ such that a Boolean
formula $f$ is unsatisfiable if and only if there exists $z$ such that
$A(f, z) = 1$. We think of the string $z$ as a refutation for $f$ in the
refutation system $A$. We think of a refutation $z$ for $f$ also as a proof for
$\neg f$ (and vice versa).

It is easy to see that $NP \neq \text{co-}NP$ if and only if for every
propositional proof system $A$ there exists a hard tautology, that is, a
tautology $f$ with no short proofs. It was hence suggested by Cook and Reckhow
to study the length of proofs for different tautologies in stronger and
stronger propositional proof systems. It turns out that in many cases these
problems are very interesting in their own right and are related to many other
interesting problems in complexity theory and in logic, in particular when the
tautology $f$ represents a fundamental mathematical principle.

For a recent survey on the main research directions in propositional proof 
theory, see [2]. 

4. Resolution 

Resolution is one of the simplest and most widely studied propositional proof 
systems. Besides its mathematical simplicity and elegance, Resolution is a very 
interesting proof system also because it generalizes the Davis-Putnam procedure 
and several other well known proof-search procedures. Moreover, Resolution is the 
base for most automated theorem provers existing today.

The Resolution rule says that if $C$ and $D$ are two clauses and $x_i$ is a
variable, then any assignment (to the variables $x_1, \ldots, x_n$) that
satisfies both of the clauses, $C \vee x_i$ and $D \vee \neg x_i$, also
satisfies the clause $C \vee D$. The clause $C \vee D$ is called the resolvent
of the clauses $C \vee x_i$ and $D \vee \neg x_i$ on the variable $x_i$.
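
The rule can be made concrete with a short Python sketch. The encoding is an assumption of this example, not of the text: a clause is a frozenset of nonzero integers, with $+i$ standing for $x_i$ and $-i$ for $\neg x_i$, and the function names are hypothetical. The helper `rule_is_sound` checks, by brute force over all assignments, the soundness statement made above for one particular pair of clauses.

```python
from itertools import product


def resolve(clause_c_x, clause_d_notx, var):
    """Resolvent of (C v x_var) and (D v NOT x_var) on the variable x_var."""
    if var not in clause_c_x or -var not in clause_d_notx:
        raise ValueError("clauses cannot be resolved on this variable")
    return (clause_c_x - {var}) | (clause_d_notx - {-var})


def satisfies(assignment, clause):
    """assignment is a tuple of 0/1 bits; bit i-1 is the value of x_i."""
    return any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)


def rule_is_sound(left, right, var, num_vars):
    """Check that every assignment satisfying both premises satisfies the resolvent."""
    resolvent = resolve(left, right, var)
    return all(
        satisfies(a, resolvent)
        for a in product((0, 1), repeat=num_vars)
        if satisfies(a, left) and satisfies(a, right)
    )


left = frozenset({1, 2})      # x_2 v x_1, that is, C = x_2
right = frozenset({-1, 3})    # x_3 v NOT x_1, that is, D = x_3
print(sorted(resolve(left, right, 1)))
print(rule_is_sound(left, right, 1, 3))
```

Resolving the unit clauses $x_1$ and $\neg x_1$ in this encoding gives the empty frozenset, which `satisfies` rejects for every assignment.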

Resolution is usually presented as a propositional refutation system for CNF 
formulas. Since there are standard ways to transform a formula to CNF (by adding 
new variables), this presentation is general enough. A Resolution refutation for a 
CNF formula $f$ is a sequence of clauses $C_1, C_2, \ldots, C_s$, such that:

1. Each clause $C_j$ is either a clause of $f$ or a resolvent of two previous
clauses in the sequence.

2. The last clause, $C_s$, is the empty clause.

We think of the empty clause as a clause that has no satisfying assignments, and
hence a contradiction was obtained.

We think of a Resolution refutation for $f$ also as a proof for $\neg f$. Without
loss of generality, we assume that no clause in a Resolution proof contains both
$x_i$ and $\neg x_i$ (such a clause is always satisfied and hence it can be
removed from the proof). The length, or size, of a Resolution proof is the number
of clauses in it.
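
The two conditions in the definition can be checked mechanically. The following Python sketch is one possible checker, under the assumption (made only for this example) that clauses are frozensets of signed integers, $+i$ for $x_i$ and $-i$ for $\neg x_i$; the names `is_resolvent` and `is_refutation` and the small formula are hypothetical. The checker runs in time polynomial in the length of the formula and of the proof, in line with the notion of a refutation system from Section 3.

```python
def is_resolvent(clause, left, right):
    """True if clause is the resolvent of left and right on some variable."""
    for lit in left:
        if -lit in right and clause == (left - {lit}) | (right - {-lit}):
            return True
    return False


def is_refutation(cnf, proof):
    """Check conditions 1 and 2 of the definition of a Resolution refutation."""
    axioms = set(cnf)
    for j, clause in enumerate(proof):
        if clause in axioms:
            continue                      # condition 1: a clause of f
        earlier = proof[:j]
        if not any(is_resolvent(clause, a, b) for a in earlier for b in earlier):
            return False                  # neither an axiom nor a resolvent
    # condition 2: the last clause is the empty clause
    return len(proof) > 0 and proof[-1] == frozenset()


def c(*lits):
    return frozenset(lits)


# f = (x1) AND (NOT x1 OR x2) AND (NOT x2), an unsatisfiable CNF formula.
cnf = [c(1), c(-1, 2), c(-2)]
proof = [c(1), c(-1, 2), c(2), c(-2), c()]
print(is_refutation(cnf, proof), len(proof))
```

In this example the proof has length 5: three clauses of $f$, the resolvent $x_2$, and the empty clause.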

We can represent a Resolution proof as an acyclic directed graph on vertices 
$C_1, \ldots, C_s$, where each clause of $f$ has out-degree 0, and any other
clause has two edges pointing to the two clauses that were used to produce it.

It is well known that Resolution is a refutation system. That is, a CNF formula
$f$ is unsatisfiable if and only if there exists a Resolution refutation for
$f$. A well-known and widely studied restricted version of Resolution (that is
still a refutation system) is called Regular Resolution. In a Regular Resolution
refutation, along any path in the directed acyclic graph, each variable is
resolved upon at most once.

5. Resolution as a search problem 

As mentioned above, we represent a Resolution proof as an acyclic directed 
graph $G$ on the vertices $C_1, \ldots, C_s$. In this graph, each clause $C_j$
which is an original clause of $f$ has out-degree 0, and any other clause has
two edges pointing to the two clauses that were used to produce it. We call the
vertices of out-degree 0 (i.e., the clauses that are original clauses of $f$)
the leaves of the graph. Without loss of generality, we can assume that the
only clause with in-degree 0 is the last clause $C_s$ (as we can just remove
any other clause with in-degree 0). We call the vertex $C_s$ the root of the
graph.

We label each vertex $C_j$ in the graph by the variable $x_i$ that was used to
derive it (i.e., the variable $x_i$ that was resolved upon), unless the clause
$C_j$ is an original clause of $f$ (and then $C_j$ is not labelled). If a
clause $C_j$ is labelled by a variable $x_i$ we label the two edges going out
from $C_j$ by 0 and 1, where the edge pointing to the clause that contains
$x_i$ is labelled by 0, and the edge pointing to the clause that contains
$\neg x_i$ is labelled by 1. That is, if the clause $C \vee D$ was derived from
the two clauses $C \vee x_i$ and $D \vee \neg x_i$ then the vertex $C \vee D$
is labelled by $x_i$, the edge from the vertex $C \vee D$ to the vertex
$C \vee x_i$ is labelled by 0 and the edge from the vertex $C \vee D$ to the
vertex $D \vee \neg x_i$ is labelled by 1. For a non-leaf node $u$ of the
graph $G$, define,

Label(u) = the variable labelling u. 

We think of Label(u) as a variable queried at the node u. 

Let $p$ be a path on $G$, starting from the root. Note that along a path $p$, a
variable $x_i$ may appear (as a label of a node $u$) more than once. We say
that the path $p$ evaluates $x_i$ to 0 if $x_i = Label(u)$ for some node $u$ on
the path $p$, and after the last appearance of $x_i$ as $Label(u)$ (of a node
$u$ on the path) the path $p$ continues on the edge labelled by 0 (i.e., if $u$
is the last node on $p$ such that $x_i = Label(u)$ then $p$ contains the edge
labelled by 0 that goes out from $u$). In the same way, we say that the path
$p$ evaluates $x_i$ to 1 if $x_i = Label(u)$ for some node $u$ on the path $p$,
and after the last appearance of $x_i$ as $Label(u)$ (of a node $u$ on the
path) the path $p$ continues on the edge labelled by 1 (i.e., if $u$ is the
last node on $p$ such that $x_i = Label(u)$ then $p$ contains the edge labelled
by 1 that goes out from $u$).

For any node $u$ of the graph $G$, we define $Zeros(u)$ to be the set of
variables that the node $u$ "remembers" to be 0, and $Ones(u)$ to be the set of
variables that the node $u$ "remembers" to be 1, that is,

$Zeros(u)$ = the set of variables that are evaluated to 0 by every path $p$
from the root to $u$.

$Ones(u)$ = the set of variables that are evaluated to 1 by every path $p$
from the root to $u$.

Note that for any $u$, the two sets $Zeros(u)$ and $Ones(u)$ are disjoint.



Resolution Lower Bounds for the Weak Pigeonhole Principle 



The following proposition gives the connection between the sets $Zeros(u)$,
$Ones(u)$ and the literals appearing in the clause $u$. The proposition is
particularly interesting when $u$ is a leaf of the graph.

Proposition 1 Let $f$ be an unsatisfiable CNF formula and let $G$ be (the graph
representation of) a Resolution refutation for $f$. Then, for any node $u$ of
$G$ and for any $x_i$, if the literal $x_i$ appears in the clause $u$ then
$x_i \in Zeros(u)$, and if the literal $\neg x_i$ appears in the clause $u$ then
$x_i \in Ones(u)$.
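
The definitions of $Zeros(u)$ and $Ones(u)$, and the statement of Proposition 1, can be checked on a small example. The Python sketch below makes several assumptions that are not in the text: clauses are frozensets of signed integers ($+i$ for $x_i$, $-i$ for $\neg x_i$), each clause occurs only once in the refutation so that it can serve as the name of its node, and the graph is a dictionary mapping each non-leaf clause to its label and to its two children. All function names are hypothetical, and the enumeration of all paths is exponential in general, so this is only meant for tiny graphs.

```python
def paths_from_root(graph, root):
    """Map each node to all root-to-node paths; a path is a list of (variable, bit) edges.

    graph maps every non-leaf clause to (label, child on edge 0, child on edge 1).
    """
    paths = {}

    def walk(node, edges):
        paths.setdefault(node, []).append(edges)
        if node in graph:
            var, child0, child1 = graph[node]
            walk(child0, edges + [(var, 0)])
            walk(child1, edges + [(var, 1)])

    walk(root, [])
    return paths


def zeros_and_ones(path_list):
    """Return (Zeros(u), Ones(u)) given the list of all root-to-u paths."""
    zeros = ones = None
    for edges in path_list:
        value = dict(edges)   # a later query of a variable overwrites an earlier one
        z = {v for v, bit in value.items() if bit == 0}
        o = {v for v, bit in value.items() if bit == 1}
        zeros = z if zeros is None else zeros & z
        ones = o if ones is None else ones & o
    return zeros, ones


def proposition_1_holds(graph, root):
    """Check the statement of Proposition 1 at every node reachable from the root."""
    for clause, path_list in paths_from_root(graph, root).items():
        zeros, ones = zeros_and_ones(path_list)
        for lit in clause:
            if (lit > 0 and lit not in zeros) or (lit < 0 and -lit not in ones):
                return False
    return True


def c(*lits):
    return frozenset(lits)


# Refutation of (x1 v x2), (NOT x1 v x2), (NOT x2 v x3), (NOT x2 v NOT x3).
# The clause x2 is used twice, so there are two paths from the root to it.
graph = {
    c(): (3, c(3), c(-3)),
    c(3): (2, c(2), c(-2, 3)),
    c(-3): (2, c(2), c(-2, -3)),
    c(2): (1, c(1, 2), c(-1, 2)),
}
paths = paths_from_root(graph, c())
print(zeros_and_ones(paths[c(2)]))
print(proposition_1_holds(graph, c()))
```

In this example the two paths to the clause $x_2$ disagree on $x_3$, so $x_3$ belongs to neither $Zeros$ nor $Ones$ of that node, while $x_2$ belongs to $Zeros$, as the proposition requires.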

6. The weak pigeonhole principle 

The Pigeonhole Principle (PHP) is probably the most widely studied tautology 
in propositional proof theory. The tautology $PHP_n$ is a DNF encoding of the
following statement: There is no one to one mapping from $n + 1$ pigeons to $n$
holes. The Weak Pigeonhole Principle (WPHP) is a version of the pigeonhole
principle that allows a larger number of pigeons. The tautology $WPHP^m_n$ (for
$m \geq n + 1$) is a DNF encoding of the following statement: There is no one
to one mapping from $m$ pigeons to $n$ holes. For $m > n + 1$, the weak
pigeonhole principle is a weaker statement than the pigeonhole principle.
Hence, it may have much shorter proofs in certain proof systems.

The weak pigeonhole principle is one of the most fundamental combinatorial 
principles. In particular, it is used in most probabilistic counting arguments and 
hence in many combinatorial proofs. Moreover, as observed by Razborov, there 
are certain connections between the weak pigeonhole principle and the problem of 
$P \neq NP$ [12]. Indeed, the weak pigeonhole principle (with a relatively large number
of pigeons) can be interpreted as a certain encoding of the following statement:
There are no small DNF formulas for SAT (where SAT is the satisfiability problem).
Hence, in most proof systems, a short proof for certain formulations of the statement
"There are no small formulas for SAT" can be translated into a short proof for the
weak pigeonhole principle. That is, a lower bound for the length of proofs for the
weak pigeonhole principle usually implies a lower bound for the length of proofs for
certain formulations of the statement $P \neq NP$. While this doesn't say much about
the problem of $P \neq NP$, it does demonstrate the applicability and relevance of the
weak pigeonhole principle for other interesting problems. 

Formally, the formula $WPHP^m_n$ is expressed in the following way. The
underlying Boolean variables, $x_{i,j}$, for $1 \leq i \leq m$ and
$1 \leq j \leq n$, represent whether or not pigeon $i$ is mapped to hole $j$.
The negation of the pigeonhole principle, $\neg WPHP^m_n$, is expressed as the
conjunction of $m$ pigeon clauses and $\binom{m}{2} \cdot n$ hole clauses. For
every $1 \leq i \leq m$, we have a pigeon clause,

$(x_{i,1} \vee \ldots \vee x_{i,n})$,

stating that pigeon $i$ maps to some hole. For every $1 \leq i_1 < i_2 \leq m$
and every $1 \leq j \leq n$, we have a hole clause,

$(\neg x_{i_1,j} \vee \neg x_{i_2,j})$,

stating that pigeons $i_1$ and $i_2$ do not both map to hole $j$. We refer to
the pigeon clauses and the hole clauses also as pigeon axioms and hole axioms.
Note that $\neg WPHP^m_n$ is a CNF formula.
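
The clauses of $\neg WPHP^m_n$ are easy to generate. The following Python sketch does so under assumptions made only for this example: the variable $x_{i,j}$ is numbered $(i-1) \cdot n + j$, a clause is a frozenset of signed integers, and the names `var`, `neg_wphp` and `is_satisfiable` are hypothetical. The brute-force satisfiability test is exponential in $m \cdot n$ and is only meant for very small parameters.

```python
from itertools import combinations, product


def var(i, j, n):
    """Positive integer naming x_{i,j} (pigeon i, hole j), both indices 1-based."""
    return (i - 1) * n + j


def neg_wphp(m, n):
    """Clauses of the CNF formula NOT WPHP^m_n: m pigeon clauses, then the hole clauses."""
    pigeon = [frozenset(var(i, j, n) for j in range(1, n + 1))
              for i in range(1, m + 1)]
    hole = [frozenset({-var(i1, j, n), -var(i2, j, n)})
            for i1, i2 in combinations(range(1, m + 1), 2)
            for j in range(1, n + 1)]
    return pigeon + hole


def is_satisfiable(cnf, num_vars):
    """Brute-force search for an assignment satisfying every clause."""
    for bits in product((0, 1), repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause) for clause in cnf):
            return True
    return False


cnf = neg_wphp(3, 2)
print(len(cnf))                      # 3 pigeon clauses plus 3 * 2 hole clauses
print(is_satisfiable(cnf, 3 * 2))    # three pigeons do not fit into two holes
```

With $m = n = 2$ the same generator produces a satisfiable formula, since two pigeons can be placed in two distinct holes.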

Let $G$ be (the graph representation of) a Resolution refutation for
$\neg WPHP^m_n$. Then, by Proposition 1, for any leaf $u$ of the graph $G$, one
of the following is satisfied:

1. $u$ is a pigeon axiom, and then for some $1 \leq i \leq m$, the variables
$x_{i,1}, \ldots, x_{i,n}$ are all contained in $Zeros(u)$.

2. $u$ is a hole axiom, and then for some $1 \leq j \leq n$, there exist two
different variables $x_{i_1,j}, x_{i_2,j}$ in $Ones(u)$.

7. Resolution lower bounds for the weak pigeonhole principle

There are trivial Resolution proofs (and Regular Resolution proofs) of length
$2^n \cdot poly(n)$ for the pigeonhole principle and for the weak pigeonhole
principle. In a seminal paper, Haken proved that for the pigeonhole principle,
the trivial proof is (almost) the best possible [7]. More specifically, Haken
proved that any Resolution proof for the tautology $PHP_n$ is of length
$2^{\Omega(n)}$. Haken's argument was further developed in several other papers
(e.g., [18, 1, 4]). In particular, it was shown that a similar argument gives
lower bounds also for the weak pigeonhole principle, but only for small values
of $m$. More specifically, super-polynomial lower bounds were proved for any
Resolution proof for the tautology $WPHP^m_n$, for $m < c \cdot n^2/\log n$
(for some constant $c$) [5].

For the weak pigeonhole principle with large values of $m$, there do exist
Resolution proofs (and Regular Resolution proofs) which are much shorter than
the trivial ones. In particular, it was proved by Buss and Pitassi that for
$m > c^{\sqrt{n \log n}}$ (for some constant $c$), there are Resolution (and
Regular Resolution) proofs of length $poly(m)$ for the tautology $WPHP^m_n$
[3]. Can this upper bound be further improved? Can one prove a matching lower
bound? Partial progress was made by Razborov, Wigderson and Yao, who proved
exponential lower bounds for Regular Resolution proofs, but only when the
Regular Resolution proof is of a certain restricted form [17].

The weak pigeonhole principle with a large number of pigeons has attracted a
lot of attention in recent years. However, the standard techniques for proving lower
bounds for Resolution failed to give lower bounds for the weak pigeonhole principle.
In particular, for $m \geq n^2$, no non-trivial lower bound was known until very recently.

In the last two years, these problems were completely solved. An exponential
lower bound for any Regular Resolution proof was proved in [8], and an
exponential lower bound for any Resolution proof was finally proved in [9].
More precisely, it was proved in [9] that for any $m$, any Resolution proof for
the weak pigeonhole principle $WPHP^m_n$ is of length
$\Omega(2^{n^{\epsilon}})$, where $\epsilon > 0$ is some global constant
($\epsilon \approx 1/8$).

The lower bound was further improved in several results by Razborov. The first
result [13] presents a proof for an improved lower bound of
$\Omega(2^{n^{\epsilon}})$, for $\epsilon = 1/3$. The second result [14] extends
the lower bound to an important variant of the pigeonhole principle, the so
called weak functional pigeonhole principle, where we require in addition that
each pigeon goes to exactly one hole. The third result [15] extends the lower
bound to another important variant of the pigeonhole principle, the so called
weak functional onto pigeonhole principle, where we require in addition that
every hole is occupied.

For a recent survey on the propositional proof complexity of the pigeonhole 
principle, see [16]. 

8. Lower bounds for $P \neq NP$

Propositional versions of the statement $P \neq NP$ were introduced by Razborov
in 1995 [10] (see also [11]). Razborov suggested trying to prove
super-polynomial lower bounds for the length of proofs for these statements in
stronger and stronger propositional proof systems. This was suggested as a step
towards proving the hardness of proving $P \neq NP$. The above mentioned results
for the weak pigeonhole principle establish such super-polynomial lower bounds
for Resolution.

Let $g : \{0,1\}^d \to \{0,1\}$ be a Boolean function. For example, we can take
$g = SAT$, where $SAT : \{0,1\}^d \to \{0,1\}$ is the satisfiability function
(or we can take any other NP-hard function). We assume that we are given the
truth table of $g$. Let $t \leq 2^d$ be some integer. We think of $t$ as a
large polynomial in $d$, say $t = d^{1000}$.

Razborov suggested studying propositional formulations of the following
statement (in the variables $Z$):

$Z$ is (an encoding of) a Boolean circuit of size $t$ $\Longrightarrow$
$Z$ does not compute the function $g$.

Note that since the truth table of $g$ is of length $2^d$, a propositional
formulation of this statement will be of length at least $2^d$, and it is not
hard to see that there are ways to write this statement as a DNF formula of
length $2^{O(d)}$ (and hence, its negation is a CNF formula of that length). The
standard way to do that is by including in $Z$ both the (topological)
description of the Boolean circuit and the value that each gate in the circuit
outputs on each input for the circuit.

In [12], Razborov presented a lower bound for the degree of Polynomial
Calculus proofs for the weak pigeonhole principle, and used this result to
prove a lower bound for the degree of Polynomial Calculus proofs for a certain
version of the above statement. Following this line of research, it was proved
in [9, 15] (in a similar way) that if $t$ is a large enough polynomial in $d$
(say $t = d^{1000}$) then any Resolution proof for certain versions of the
above statement is of length super-polynomial in $2^d$, that is,
super-polynomial in the length of the statement.

In particular, this can be interpreted as a super-polynomial lower bound for
Resolution proofs for certain formulations of the statement $P \neq NP$ (or,
more precisely, of the statement $NP \not\subseteq P/poly$).

It turns out that the exact way to give the (topological) description of the 
circuit is also important in some cases. This was done slightly differently in
[9] and in [15]. In [9], $Z$ was used to encode a Boolean circuit of unbounded
fan-in, whereas [15] considered Boolean circuits of fan-in 2. It turns out that
for the stronger case of unbounded fan-in, the lower bound for the weak
pigeonhole principle is enough [9], whereas for the weaker case of fan-in 2 one
needs the lower bound for the weak functional onto pigeonhole principle [15]
(in fact, this was one of the main motivations to consider the onto functional
case). Otherwise, the proof seems to be quite robust in the way the Boolean
circuit is encoded.

Acknowledgement. I would like to thank Toni Pitassi for a very enjoyable
collaboration that led to the results in [8, 9].